HMM MODIFICATION METHOD

Field of the Invention 

The present invention relates to an HMM modification
method and, more particularly, to an HMM modification
method that prevents an overfitting problem, reduces the
number of parameters and avoids gradient calculation by
using the modified misclassification measure itself as a
weighted loss function and computing a delta coefficient
in order to modify an HMM weight.

Description of Related Arts 

Hidden Markov modeling (HMM) has become prevalent in
speech recognition for expressing acoustic characteristics.
It is statistically based and links the modeling of acoustic
characteristics to a method for estimating the distribution
of the HMM. The most commonly used of these distribution
estimation methods is the maximum likelihood (ML)
estimation method.

However, in the ML estimation method, it is very
difficult to obtain complete knowledge of the form of the
data distribution, and the training data are always
inadequate for speech recognition. The performance of a
recognizer is normally defined by its expected recognition
error rate, and an optimal recognizer is the one that
achieves the least expected recognition error rate. From
this perspective, a minimum classification error (MCE)
training method based on the generalized probabilistic
descent (GPD) algorithm has been studied.

The object of the MCE training method is not to
estimate the statistical distribution of the data but to
discriminate between the target data of the HMMs so as to
obtain the optimal recognition result. That is, the MCE
training method minimizes the recognition error rate.

Meanwhile, improving the performance of speech
recognition by controlling HMM parameters such as the
mixture weight, mean and standard deviation, without
improved feature extraction or improved acoustic
resolution of the acoustic model, has been studied. As an
enhancement of the MCE training method, the training of
state weights has been studied for optimizing a speech
recognizer. The training method using a state weight uses
discriminative information between speech classes in the
HMM state probability. MCE training is usually performed
together with the ML training method, and it outperforms
estimation of the HMM by the ML training method alone.

Hereinafter, the MCE training method is briefly explained.
In a conventional HMM-based speech recognizer, the
discriminant function of class i for pattern classification
is defined by the following equation:

$$g_i(X;\Lambda) = \log\{g_i(X,\bar{q};\Lambda)\} = \sum_{t=1}^{T}\left[\log a^{(i)}_{\bar{q}_{t-1}\bar{q}_t} + \log b^{(i)}_{\bar{q}_t}(x_t)\right] + \log \pi^{(i)}_{\bar{q}_0}$$ Eq. 1

In Eq. 1, $\Lambda$ is the set of classifier parameters, $X$ is
an observation sequence, $\bar{q} = (\bar{q}_0, \bar{q}_1, \ldots, \bar{q}_T)$
is the optimal state sequence that maximizes the joint
state-observation function for class i, and $a_{ij}$ denotes
the probability of transition from state i to state j.

$b_j(x_t)$ denotes the probability density function of
observing $x_t$ at state j. In a continuous multivariate
mixture Gaussian HMM, the state output distribution is
defined by the following equation:

$$b_j(x_t) = \sum_{m=1}^{M} c_{jm}\, N(x_t;\, \mu_{jm},\, \Sigma_{jm})$$ Eq. 2

In Eq. 2, $N(\cdot)$ denotes a multivariate Gaussian density,
$\mu_{jm}$ is the mean vector in state j, mixture m, and
$\Sigma_{jm}$ is the covariance matrix in state j, mixture m.
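
By way of a non-limiting illustration, the state output density of Eq. 2 may be sketched in Python as follows. The sketch assumes diagonal covariance matrices, which is a simplification made here and is not required by Eq. 2, and it works in the log domain for numerical stability. The function names `log_gaussian_diag` and `log_mixture_density` are hypothetical and do not appear in the specification.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log of a multivariate Gaussian density with diagonal covariance (assumed)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_mixture_density(x, weights, means, variances):
    """log b_j(x) = log sum_m c_jm * N(x; mu_jm, Sigma_jm), via log-sum-exp."""
    terms = [
        math.log(c) + log_gaussian_diag(x, mu, var)
        for c, mu, var in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

Each entry of `weights` plays the role of a mixture weight, and each entry of `means` and `variances` holds the mean vector and the diagonal of the covariance matrix of one mixture component.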

For an input utterance X, the class $C_i$ is decided
according to the following rule:

$$C(X) = C_i \quad \text{if} \quad i = \arg\max_j\, g_j(X;\Lambda)$$ Eq. 3

In Eq. 3, $g_j(X;\Lambda)$ is the discriminant function of the
input utterance, or observation sequence,
$X = (x_1, x_2, \ldots, x_T)$ for the jth model.

First, it is necessary to express the operational
decision rule of Eq. 3 in a functional form. A class
misclassification measure, which is a continuous function
of the classifier parameters $\Lambda$ and attempts to emulate
the decision rule, is therefore defined by the following
equation:

$$d_i(X;\Lambda) = -g_i(X;\Lambda) + \log\left[\frac{1}{N}\sum_{j,\,j\neq i}\exp[g_j(X;\Lambda)\,\eta]\right]^{1/\eta}$$ Eq. 4

In Eq. 4, $\eta$ is a positive constant and N is the number
of N-best competing classes. For an ith class utterance X,
$d_i(X) > 0$ implies misclassification and $d_i(X) < 0$ means
correct classification.
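
As a minimal sketch, and not a definitive implementation, the misclassification measure of Eq. 4 may be computed as shown below. The sketch assumes that `scores` holds the discriminant values of the correct class and of its competing classes, so that N is simply the number of competitors in the list. The function name `misclassification_measure` is hypothetical.

```python
import math

def misclassification_measure(scores, correct, eta=1.0):
    """d_i = -g_i + (1/eta) * log[(1/N) * sum_{j != i} exp(eta * g_j)]."""
    competitors = [g for j, g in enumerate(scores) if j != correct]
    scaled = [eta * g for g in competitors]
    peak = max(scaled)  # subtract the peak before exponentiating to avoid underflow
    log_mean = peak + math.log(sum(math.exp(s - peak) for s in scaled) / len(scaled))
    return -scores[correct] + log_mean / eta
```

With a single competitor the measure reduces to the difference between the competitor score and the correct-class score, so its sign directly indicates whether the utterance is misclassified.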

The complete loss function is defined in terms of the
misclassification measure using a smooth zero-one function
as follows:

$$\ell_i(X;\Lambda) = \ell(d_i(X;\Lambda))$$ Eq. 5



The smooth zero-one function can be any continuous
zero-one function, but is typically the following sigmoid
function:

$$\ell(d) = \frac{1}{1 + \exp(-\gamma d + \theta)}$$ Eq. 6

In Eq. 6, $\theta$ is usually set to zero or slightly smaller
than zero and $\gamma$ is a constant. Finally, for any unknown
X, the classifier performance is measured by the following
equation:

$$\ell(X;\Lambda) = \sum_{i=1}^{M} \ell_i(X;\Lambda)\, 1(X \in C_i)$$ Eq. 7

In Eq. 7, $1(\cdot)$ is the indicator function.
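
For illustration only, the smooth zero-one loss may be sketched as below, assuming the sigmoid form commonly used in MCE training, 1 / (1 + exp(-gamma * d + theta)). The parameter names `gamma` and `theta` and their default values are assumptions of this sketch.

```python
import math

def sigmoid_loss(d, gamma=1.0, theta=0.0):
    """Smooth zero-one loss applied to a misclassification measure d."""
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))
```

A strongly negative measure (correct classification) gives a loss near zero, a strongly positive measure gives a loss near one, and a measure of zero with `theta` at zero gives one half.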

The optimal classifier parameters are those that
minimize the expected loss function. The generalized
probabilistic descent (GPD) algorithm is used to minimize
the expected loss function. The GPD algorithm is given by
the following equation:

$$\Lambda_{n+1} = \Lambda_n - \varepsilon_n U_n \nabla \ell(X;\Lambda)\big|_{\Lambda=\Lambda_n}$$ Eq. 8

In Eq. 8, $U_n$ is a positive definite matrix, $\varepsilon_n$ is
the learning rate or step size of adaptation, and $\Lambda_n$ is
the classifier parameter set at time n.
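
As a hedged illustration of the update in Eq. 8, the following sketch takes the positive definite matrix to be the identity, which is an assumption of the example, and applies the step to a toy quadratic loss rather than to an HMM. The name `gpd_step` and the toy loss are hypothetical.

```python
def gpd_step(params, grad, step_size):
    """One descent update with the positive definite matrix taken as identity (assumed)."""
    return [p - step_size * g for p, g in zip(params, grad)]

params = [0.0]
for _ in range(100):
    grad = [2.0 * (params[0] - 3.0)]  # gradient of the toy loss (p - 3)^2
    params = gpd_step(params, grad, step_size=0.1)
```

The example shows only the shape of the update. In MCE training the gradient would be taken with respect to the HMM parameters, which is the costly step the conventional method requires.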

The GPD algorithm is an unconstrained optimization
technique. However, certain constraints must be maintained
for HMMs, so some modifications are required. Instead of
using a complicated constrained GPD algorithm, Chou et al.
applied GPD to transformed HMM parameters. The parameter
transformations ensure that there are no constraints in the
transformed space where the updates occur. The following
HMM constraints should be maintained in the original space.
The HMM constraints are expressed as:

$$\sum_j a_{ij} = 1 \ \text{and}\ a_{ij} \ge 0, \quad \sum_k c_{jk} = 1,\ c_{jk} \ge 0, \quad \sigma_{jkl} > 0$$ Eq. 9
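The following parameter transformations should be
used before and after parameter adaptation:

$$a_{ij} \rightarrow \tilde{a}_{ij} \quad \text{where} \quad a_{ij} = e^{\tilde{a}_{ij}} \Big/ \sum_k e^{\tilde{a}_{ik}}$$

$$c_{jk} \rightarrow \tilde{c}_{jk} \quad \text{where} \quad c_{jk} = e^{\tilde{c}_{jk}} \Big/ \sum_m e^{\tilde{c}_{jm}}$$

$$\sigma_{jkl} \rightarrow \tilde{\sigma}_{jkl} \quad \text{where} \quad \tilde{\sigma}_{jkl} = \log \sigma_{jkl}$$ Eq. 10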
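
The effect of such a transformation can be illustrated with a short sketch. It assumes the exponential normalization form for one row of transition probabilities, so that any real-valued update in the transformed space maps back to values that are positive and sum to one. The function names `to_original` and `to_transformed` are hypothetical.

```python
import math

def to_original(transformed):
    """Map unconstrained values to probabilities: exp(v_j) / sum_k exp(v_k)."""
    peak = max(transformed)
    exps = [math.exp(v - peak) for v in transformed]
    total = sum(exps)
    return [e / total for e in exps]

def to_transformed(probabilities):
    """One valid inverse (unique only up to an additive constant): log p_j."""
    return [math.log(p) for p in probabilities]
```

Because the constraints hold automatically after mapping back, an unconstrained update rule can be applied in the transformed space without any projection step.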



As mentioned above, the GPD-based MCE training method
requires calculating gradients with respect to the HMM
parameters and obtaining the optimal state sequence. Both
the gradient calculation and the search for the optimal
state sequence require a huge amount of computation.

Moreover, the above-mentioned HMM state probability
modification method produces an overfitting problem,
because the training data are used iteratively to adjust
the misclassification measure.

Summary of the Invention

It is, therefore, an object of the present invention
to provide an HMM modification method for reducing the
recognition error rate while eliminating the need to obtain
the optimal state sequence and to calculate gradients.

It is another object of the present invention to
provide an HMM modification method for decreasing the
amount of computation by eliminating gradient calculation.

It is still another object of the present invention
to provide an HMM modification method for reducing the
number of parameters by applying a weight to each HMM,
thereby improving the performance of speech recognition.

It is yet another object of the present invention to
provide an HMM modification method for preventing
overfitting to the training data by using an enhanced loss
function.

In accordance with an aspect of the present invention,
there is provided an HMM modification method, including the
steps of: a) performing Viterbi decoding for pattern
classification; b) calculating a misclassification measure
using a discriminant function; c) obtaining a modified
misclassification measure for a weighted loss function; d)
computing a delta coefficient according to the obtained
misclassification measure; e) modifying an HMM weight
according to the delta coefficient; and f) transforming
the HMM weights to satisfy a limitation condition.

In accordance with another aspect of the present
invention, there is provided an HMM modification method
including a step of obtaining a modified misclassification
measure by using the weighted loss function
$\tilde{d}_i(X;\Lambda)$, which is defined as:

$$\tilde{d}_i(X;\Lambda) = d_i(X;\Lambda) - k\,g_i(X;\Lambda) = -(1+k)\,g_i(X;\Lambda) + \log\left[\frac{1}{N}\sum_{j,\,j\neq i}\exp[g_j(X;\Lambda)\,\eta]\right]^{1/\eta}$$

wherein i and j are positive integers, i representing a
class index, $g_i(X;\Lambda)$ is the discriminant function for
class i, with $\Lambda$ being a set of classifier parameters and
X an observation sequence, N is an integer representing
the number of class models, and k is a positive number
representing the number of HMM states.

In accordance with still another aspect of the
present invention, there is provided an HMM modification
method including a step of computing a delta coefficient
$\Delta w_i$, which is obtained based on a discriminant function
and the weighted loss function and is defined as:

$$\Delta w_i = \frac{\tilde{d}_i(X;\Lambda)}{-g_i(X;\Lambda)}$$

wherein $\tilde{d}_i(X;\Lambda)$ is the weighted loss function for
class i, $g_i(X;\Lambda)$ is the discriminant function, $\Lambda$ is
a set of classifier parameters, X is an observation
sequence, and i is a positive integer representing a class
index.

Brief Description of the Drawing(s)

The above and other objects and features of the
present invention will become apparent from the following
description of the preferred embodiments given in
conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of an HMM modification method in
accordance with a preferred embodiment of the present
invention.

Detailed Description of the Invention

Other objects and aspects of the invention will
become apparent from the following description of the
embodiments with reference to the accompanying drawings,
which is set forth hereinafter.

To aid understanding of the HMM modification method
in accordance with the present invention, the fundamental
concept of the method is explained first.

The HMM modification method adjusts HMM weights
according to a misclassification measure and iteratively
applies the adjusted HMM weights to pattern classification
in order to minimize the classification error.
An input utterance is classified according to its
pattern by using a discriminant function. During pattern
classification, an HMM weight is applied to each HMM. To
apply the HMM weight to each HMM, the output score of the
HMM is expressed, in the Viterbi decoding method, as the
product of the HMM output probability value and the HMM
weight. For the mathematical explanation, it is assumed
that M HMMs are set up as basic utterance recognition
units and that each basic utterance recognition unit
consists of j HMMs. Pattern recognition based on HMMs is
performed by using a class decision rule with the
discriminant function of class i. The conventional
discriminant function of class i is expressed by Eq. 1.
Similarly, the discriminant function of class i in the
present invention is expressed by the following equation:



$$g_i(X;\Lambda) = w_i\left\{\sum_{t=1}^{T}\left[\log a^{(i)}_{\bar{q}_{t-1}\bar{q}_t} + \log b^{(i)}_{\bar{q}_t}(x_t)\right] + \log \pi^{(i)}_{\bar{q}_0}\right\}$$ Eq. 11

In Eq. 11, $w_i$ is the HMM weight for class i. The
summation of the HMM weights in an HMM set is limited to
the total number of HMMs, as shown in the following
equation:

$$\sum_{i=1}^{M} w_i = M$$ Eq. 12

Under this limitation, a recognition algorithm based on
the N-best string model obtains the same result as the
unweighted recognizer when the HMM weights are initially
set to 1. This is because the recognition process then
proceeds smoothly, without the large variation in
probability values that would otherwise be caused by the
conventional parameter estimation method and the Viterbi
search algorithm.
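
The role of the initial weights can be illustrated with a short sketch. It assumes that the weighted score of each HMM is its log score multiplied by its HMM weight, as described above. The score values and the function names `weighted_scores` and `satisfies_sum_constraint` are hypothetical.

```python
def weighted_scores(log_scores, weights):
    """Scale each HMM's log score by its weight (assumed product form)."""
    return [w * g for w, g in zip(weights, log_scores)]

def satisfies_sum_constraint(weights, tol=1e-9):
    """Check the limitation that the weights sum to the number of HMMs."""
    return abs(sum(weights) - len(weights)) <= tol

log_scores = [-120.0, -118.5, -131.2]  # hypothetical unweighted log scores
unit_weights = [1.0, 1.0, 1.0]         # initial HMM weights
```

With every weight set to 1 the weighted scores equal the unweighted scores and the sum limitation is met, so training starts from the behavior of the unweighted recognizer.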

After the pattern of the input utterance has been
classified, a misclassification measure is calculated. In
the present invention, the weighted loss function is
implemented as the misclassification measure. That is, the
misclassification measure between the training class model
and the N class models is expressed as:

$$\tilde{d}_i(X;\Lambda) = d_i(X;\Lambda) - k\,g_i(X;\Lambda) = -(1+k)\,g_i(X;\Lambda) + \log\left[\frac{1}{N}\sum_{j,\,j\neq i}\exp[g_j(X;\Lambda)\,\eta]\right]^{1/\eta}$$ Eq. 13

First, the misclassification measure is modified by
adding a weighted likelihood of the correct class to the
misclassification measure. This modified misclassification
measure can be inserted into a sigmoid function to produce
the sigmoid zero-one loss function. In the present
invention, however, the misclassification measure itself
is used as the loss function, which produces a linear loss
function. With this loss function, the gradient of the
loss function associated with the correct string is
increased by a uniform factor k, while the gradient of the
loss function associated with the incorrect strings is not
affected, as shown in Eq. 13.
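
As a sketch under stated assumptions, the modified misclassification measure may be computed as below, taking it to be the conventional measure with the correct-class score weighted by (1 + k). The function names `log_mean_exp` and `modified_measure` and the default value of `eta` are hypothetical.

```python
import math

def log_mean_exp(values, eta):
    """(1/eta) * log[(1/N) * sum_j exp(eta * g_j)], computed stably."""
    scaled = [eta * v for v in values]
    peak = max(scaled)
    return (peak + math.log(sum(math.exp(s - peak) for s in scaled) / len(scaled))) / eta

def modified_measure(correct_score, competitor_scores, k, eta=1.0):
    """Weighted linear loss: -(1 + k) * g_i plus the competitor term."""
    return -(1.0 + k) * correct_score + log_mean_exp(competitor_scores, eta)
```

Raising the correct-class score by one unit lowers this loss by exactly 1 + k, whereas the competitor term is unchanged, which is the uniform scaling of the correct-string gradient described above.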

As a result, the modified misclassification measure
yields two further loss functions: the sigmoid zero-one
loss function, in which the modified misclassification
measure is inserted into a sigmoid function, and the
weighted linear loss function, which is exactly the same
as the modified misclassification measure.

After the misclassification measure has been obtained,
a delta coefficient is obtained for modifying the HMM
weight.

To control the HMM weight for class i, the quantity
by which the HMM weights of class i are adapted needs to
be set. This quantity is defined as the delta coefficient
and is denoted by $\Delta w_i$. By using the value of the
discriminant function $g_i(X;\Lambda)$ for class i and the
misclassification measure $\tilde{d}_i(X;\Lambda)$, the delta
coefficient is expressed by the following equation:

$$\Delta w_i = \frac{\tilde{d}_i(X;\Lambda)}{-g_i(X;\Lambda)}$$ Eq. 14



By using the delta coefficient, the training of the HMM
weight for class i, which has 1 as its initial value, is
performed repeatedly according to the following equation:

$$w_i(n+1) = w_i(n) - \varepsilon_n \cdot w_i(n) \cdot \Delta w_i$$ Eq. 15
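
The following worked sketch illustrates one weight update. It assumes that the delta coefficient is the misclassification measure divided by the negated discriminant value, and that the weighted score is the product of the HMM weight and the unweighted log score. All numeric values and function names are hypothetical.

```python
def delta_coefficient(loss, score):
    """Delta coefficient, read here as loss / (-score)."""
    return loss / -score

def update_weight(weight, delta, step_size):
    """Multiplicative update: w(n+1) = w(n) - eps * w(n) * delta."""
    return weight - step_size * weight * delta

raw_score = -100.0          # hypothetical unweighted log score of the correct class
weight = 1.0                # initial HMM weight
score = weight * raw_score  # weighted discriminant value
loss = 4.0                  # hypothetical positive misclassification measure

delta = delta_coefficient(loss, score)
new_weight = update_weight(weight, delta, step_size=0.5)
new_score = new_weight * raw_score
```

Under these assumptions the update raises the correct-class score by the step size times the loss (here from -100 to -98) without computing any gradient, because the log score is negative and a smaller weight therefore yields a larger score.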



Finally, the training of the HMM weights is performed by
using Eq. 15, and the HMM weights are transformed after the
HMM weight training. The transformation of the parameters
is performed by the following equation:

$$w_i \rightarrow \tilde{w}_i \quad \text{where} \quad w_i = \frac{M\,e^{\tilde{w}_i}}{\sum_{j=1}^{M} e^{\tilde{w}_j}}$$ Eq. 16

Eq. 16 is applied to the HMM weights to satisfy the
limitation condition that the summation of the HMM weights
in an HMM set must be equal to the total number of HMMs in
the HMM set.

In Eq. 16, $\tilde{w}_i$ is the HMM weight of class i in the
transformed space, corresponding to the HMM weight $w_i$ for
class i in the original space.
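
One way to realize such a transformation is sketched below. The scaling by the number of HMMs is an assumption of this sketch, chosen so that the resulting weights sum to the total number of HMMs as the limitation condition requires. The function name `transform_weights` is hypothetical.

```python
import math

def transform_weights(transformed):
    """w_i = M * exp(v_i) / sum_j exp(v_j), so the weights sum to M (assumed form)."""
    count = len(transformed)
    peak = max(transformed)
    exps = [math.exp(v - peak) for v in transformed]
    total = sum(exps)
    return [count * e / total for e in exps]
```

Equal values in the transformed space map to weights of exactly 1, which matches the initial condition of the training.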

Also, a recognition algorithm for continuous speech
recognition takes each HMM weight into account in the
Viterbi search step. The recognition algorithm is defined
as:

$$V[t][j] = \max_i \left[ V[t-1][i] + w(i)\cdot\{\log a_{ij}\} \right] + w(j)\cdot\log b_j(x_t)$$
$$w(j) = w_k, \quad \text{if } j \in H_k, \quad k = 1, 2, \ldots, M$$ Eq. 17

In Eq. 17, $V[t][j]$ is the accumulated score at state j
at time t, i denotes the initial state of the transition,
and $H_k$ denotes the kth HMM. $\log b_j(x_t)$ is the log
probability value of observing the observation vector
$x_t$, and $w_k$ is the HMM weight of the kth HMM.

Fig. 1 is a flowchart of a method for modifying HMM
weights in accordance with a preferred embodiment of the
present invention. It is assumed that class i consists of
K HMMs for the training utterance.
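
The weighted Viterbi recursion of Eq. 17 may be sketched as follows. The sketch reads the recursion as weighting each transition log probability by the weight of the source state's HMM and each observation log probability by the weight of the destination state's HMM. The handling of the first frame and the function name `weighted_viterbi` are assumptions of this example.

```python
def weighted_viterbi(log_init, log_trans, log_obs, state_weight):
    """Weighted Viterbi recursion in the log domain.

    log_init[j]     : initial log score of state j
    log_trans[i][j] : log a_ij
    log_obs[t][j]   : log b_j(x_t)
    state_weight[j] : w(j), the weight of the HMM that state j belongs to
    Returns the best accumulated score after the last frame.
    """
    n_states = len(log_init)
    # Assumption: the first frame adds w(j) * log b_j(x_1) to the initial score.
    prev = [log_init[j] + state_weight[j] * log_obs[0][j] for j in range(n_states)]
    for t in range(1, len(log_obs)):
        cur = []
        for j in range(n_states):
            best = max(prev[i] + state_weight[i] * log_trans[i][j] for i in range(n_states))
            cur.append(best + state_weight[j] * log_obs[t][j])
        prev = cur
    return max(prev)
```

With all state weights equal to 1 the function reduces to the ordinary log-domain Viterbi score, so the weights can be introduced without changing the search procedure itself.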

Referring to Fig. 1, utterances are first input for
speech recognition at step S110. For continuous speech
recognition, Viterbi decoding is performed to compute the
discriminant function of each HMM at step S120. After the
discriminant function has been computed, a
misclassification measure is obtained according to the
discriminant function at step S130. As mentioned above,
the modified misclassification measure is either used as
the weighted loss function or inserted into a sigmoid
function to form the sigmoid zero-one loss function. By
using the misclassification measure of Eq. 13 to obtain
the weighted loss function, the overfitting problem of the
conventional method can be prevented.

If the misclassification measure is a positive number
at step S140, a delta coefficient $\Delta w_i$ is computed
based on the discriminant function of Eq. 11 and the
weighted loss function of Eq. 13. That is, the delta
coefficient $\Delta w_i$ is defined by Eq. 14 and is computed,
at step S150, to control the score for the training data
in order to reduce the misclassification measure.

After the delta coefficient has been computed, the HMM
weight is modified according to the delta coefficient at
step S160.

That is, the delta coefficient is reflected in each
HMM weight in the training class. The HMM weights in the
training class are modified according to the following
equation:

$$w_i^{(k)}(n+1) = w_i^{(k)}(n) - \varepsilon_n \cdot w_i^{(k)}(n) \cdot \Delta w_i, \quad k = 1, 2, \ldots, K$$ Eq. 18

In Eq. 18, $w_i^{(k)}$ is the weight of the kth HMM in class
i, $\Delta w_i$ is the delta coefficient of class i, and
$\varepsilon_n$ is the learning rate at the nth training
iteration.

After the HMM weights have been modified, the
classifier parameters are transformed at step S170 to
satisfy the limitation condition on the HMM weights,
according to the following equation:

$$w_k \rightarrow \tilde{w}_k \quad \text{where} \quad w_k = \frac{M\,e^{\tilde{w}_k}}{\sum_{j=1}^{M} e^{\tilde{w}_j}}$$ Eq. 19

The transformed classifier parameters are applied at
step S120 for better recognition performance.

If the misclassification measure is not positive at
step S140, the process returns to step S110 to receive a
new utterance.
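
The flow of steps S110 to S170 can be summarized in the following sketch, which is illustrative only. It assumes one HMM weight per class, takes precomputed unweighted log scores in place of actual Viterbi decoding, and rescales the weights so that they sum to the number of HMMs. The function names, parameter defaults and data layout are hypothetical.

```python
import math

def competitor_term(scores, correct, eta):
    """(1/eta) * log[(1/N) * sum_{j != i} exp(eta * g_j)]."""
    scaled = [eta * g for j, g in enumerate(scores) if j != correct]
    peak = max(scaled)
    return (peak + math.log(sum(math.exp(s - peak) for s in scaled) / len(scaled))) / eta

def train_weights(utterances, n_classes, k=0.1, eta=1.0, step_size=0.1, epochs=5):
    """utterances: list of (raw_scores, correct_class), where raw_scores[j] is
    the unweighted log score of class j (standing in for Viterbi decoding)."""
    weights = [1.0] * n_classes                       # initial HMM weights
    for _ in range(epochs):
        for raw_scores, correct in utterances:        # S110: next utterance
            scores = [w * g for w, g in zip(weights, raw_scores)]     # S120
            loss = (-(1.0 + k) * scores[correct]
                    + competitor_term(scores, correct, eta))          # S130
            if loss <= 0.0:                           # S140: no adaptation needed
                continue
            delta = loss / -scores[correct]           # S150
            weights[correct] -= step_size * weights[correct] * delta  # S160
            # S170 (assumed form): rescale so the weights sum to n_classes
            total = sum(weights)
            weights = [n_classes * w / total for w in weights]
    return weights
```

For a single training utterance whose correct class initially scores slightly below its competitor, repeated passes lower the correct-class weight and raise the competitor weight until the correct class obtains the higher weighted score, with no gradient being evaluated at any step.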

As mentioned above, the present invention can prevent
overfitting to the training data by implementing a
weighted loss function as the misclassification measure.
Furthermore, the present invention can reduce the number
of parameters to be estimated and avoid gradient
calculation by computing a delta coefficient and modifying
an HMM weight according to the delta coefficient, thereby
reducing the amount of computation in speech recognition.

While the present invention has been described with
respect to certain preferred embodiments, it will be
apparent to those skilled in the art that various changes
and modifications may be made without departing from the
scope of the invention as defined in the following claims.